Conversation

@dot-agi
Member

@dot-agi dot-agi commented May 16, 2025

📥 Pull Request

📘 Description
Sets the trace context in the OpenAI instrumentor from the OpenAI Agents SDK instrumentor, so that Responses API calls made through the Agents SDK are not recorded twice.
Closes #974

🧪 Testing
Tested with the customer service agent example and the demo repo.

@dot-agi dot-agi requested review from Dwij1704, areibman and Copilot May 16, 2025 22:40
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request fixes duplicate LLM calls for the Agents SDK by adding special handling for "ResponseSpanData" and integrating custom wrappers into the OpenAI instrumentation. Key changes include:

  • Adding context propagation for ResponseSpanData in exporter.py.
  • Wrapping the Responses API calls in instrumentor.py with custom wrappers to leverage the Agents SDK trace context (see the sketch after this list).
  • Unwrapping the custom wrappers when uninstrumenting, with enhanced debug logging in both files.
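
For illustration, here is a minimal sketch of the custom-wrapper idea under stated assumptions: wrapt's wrap_function_wrapper, the "openai_agents.span_id" context key quoted later in this review, and an assumed module path. The wrapper name and fallback logic are illustrative, not the PR's actual code.

```python
# Hypothetical sketch of the custom-wrapper approach; only wrap_function_wrapper
# and the OpenTelemetry context API are known quantities here, the rest is assumed.
from wrapt import wrap_function_wrapper
from opentelemetry import context as context_api


def _responses_create_wrapper(wrapped, instance, args, kwargs):
    """Wrap Responses.create so calls made under an Agents SDK span reuse that
    span's trace context instead of producing a second, duplicate LLM span."""
    agents_span_id = context_api.get_value("openai_agents.span_id")
    if agents_span_id is not None:
        # The Agents SDK instrumentor already owns this call's span:
        # call through without starting another one.
        return wrapped(*args, **kwargs)
    # No Agents SDK context: fall back to standalone instrumentation
    # (span creation elided in this sketch).
    return wrapped(*args, **kwargs)


# Applied during _instrument(); the module and attribute paths are assumptions:
wrap_function_wrapper(
    "openai.resources.responses", "Responses.create", _responses_create_wrapper
)
```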

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Reviewed files:

  • agentops/instrumentation/openai_agents/exporter.py: Adds special handling for ResponseSpanData to propagate trace context
  • agentops/instrumentation/openai/instrumentor.py: Introduces custom wrappers for both synchronous and asynchronous responses and handles unwrapping for the custom instrumentation

Comments suppressed due to low confidence (1)

agentops/instrumentation/openai_agents/exporter.py:319

  • The variable 'span_id' is used without being defined. Please ensure 'span_id' is properly assigned before this line or update the reference if a different variable should be used.
ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)

@codecov

codecov bot commented May 16, 2025

Codecov Report

Attention: Patch coverage is 58.92857% with 46 lines in your changes missing coverage. Please review.

Files with missing lines:

  • agentops/instrumentation/openai/instrumentor.py: 62.50% patch coverage, 36 lines missing ⚠️
  • agentops/instrumentation/openai_agents/exporter.py: 33.33% patch coverage, 10 lines missing ⚠️


@dot-agi dot-agi requested a review from Copilot May 16, 2025 23:16
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes duplicate LLM calls by setting the context in the OpenAI instrumentor using information from the OpenAI Agents SDK. Key changes include adding tests to verify context propagation in custom wrappers, propagating trace context within the exporter for ResponseSpanData, and updating the instrumentor to use custom wrappers for responses.
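
As a rough idea of what such a context-propagation test could look like, assuming the hypothetical _responses_create_wrapper sketched in the previous review (all names and values are illustrative):

```python
# Hypothetical test sketch: the wrapper should call straight through when an
# Agents SDK span id is already present on the OpenTelemetry context.
from unittest import mock

from opentelemetry import context as context_api


def test_wrapper_respects_agents_context():
    wrapped = mock.Mock(return_value="response")
    token = context_api.attach(
        context_api.set_value("openai_agents.span_id", "abc123")
    )
    try:
        result = _responses_create_wrapper(wrapped, None, (), {"model": "o4-mini"})
    finally:
        context_api.detach(token)
    assert result == "response"
    wrapped.assert_called_once_with(model="o4-mini")
```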

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Reviewed files:

  • tests/unit/instrumentation/openai_core/test_custom_wrappers.py: Added unit tests to ensure the custom responses wrappers correctly set OpenAI Agents SDK context.
  • agentops/instrumentation/openai_agents/exporter.py: Introduced trace context propagation for ResponseSpanData to prevent duplicate spans and calls.
  • agentops/instrumentation/openai/instrumentor.py: Updated the instrumentor to wrap and unwrap responses using custom wrappers with added logging.

Comments suppressed due to low confidence (1)

agentops/instrumentation/openai_agents/exporter.py:319

  • The variable 'span_id' is used here but not defined in this scope. Ensure 'span_id' is properly retrieved or assigned before being used.
ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)

@areibman
Contributor

areibman commented May 17, 2025

(screenshot) Still seeing duplicates in the tool_example notebook.

@dot-agi dot-agi requested a review from Copilot May 19, 2025 19:39
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes duplicate LLM calls in the Agents SDK by updating how the OpenAI responses instrumentation is applied. Key changes include:

  • Switching from the standard wrap/unwrap to using custom wrappers via wrap_function_wrapper (an unwrap sketch follows this list).
  • Updating and extending tests for both synchronous and asynchronous response instrumentation.
  • Adding special handling in the OpenAI Agents exporter to propagate trace context from response spans.
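
For the unwrap side, a minimal sketch assuming the wrappers were installed with wrapt as above; the class and attribute names are assumptions about the openai client, not the PR's actual code.

```python
# Hypothetical sketch; wrapt exposes the original callable on __wrapped__, so
# uninstrumenting means swapping it back in if a proxy is present.
from openai.resources import responses
from wrapt import ObjectProxy


def _unwrap(owner, attr):
    maybe_wrapper = getattr(owner, attr, None)
    if isinstance(maybe_wrapper, ObjectProxy):
        setattr(owner, attr, maybe_wrapper.__wrapped__)


# Called from _uninstrument() for both the sync and async Responses clients:
_unwrap(responses.Responses, "create")
_unwrap(responses.AsyncResponses, "create")
```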

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Reviewed files:

  • tests/unit/instrumentation/openai_core/test_instrumentor.py: Updated tests to verify the use of custom wrappers with wrap_function_wrapper
  • tests/unit/instrumentation/openai_core/test_custom_wrappers.py: Added tests to ensure custom wrappers correctly handle context and span attributes
  • agentops/instrumentation/openai_agents/exporter.py: Added logic for processing ResponseSpanData and propagating trace context
  • agentops/instrumentation/openai/instrumentor.py: Updated _instrument and _uninstrument to use custom wrappers and provide fallback logging

Comments suppressed due to low confidence (1)

agentops/instrumentation/openai_agents/exporter.py:319

  • The variable 'span_id' is used without being defined. Consider retrieving the span identifier from the span (similarly to how 'trace_id' and 'parent_id' are obtained) to ensure proper context propagation; a sketch of this fix follows.
ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)
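
For concreteness, a minimal sketch of the fix this comment suggests, assuming the Agents SDK span object exposes trace_id, parent_id, and span_id attributes; the helper name and attribute names are assumptions.

```python
# Hypothetical sketch; assigns span_id the same way trace_id and parent_id are
# reportedly read, so the flagged set_value call has a defined value.
from opentelemetry import context as context_api


def _set_agents_context(span, ctx):
    trace_id = span.trace_id    # assumed attribute, per the review's hint
    parent_id = span.parent_id  # assumed attribute, per the review's hint
    span_id = span.span_id      # the assignment the review says is missing
    ctx = context_api.set_value("openai_agents.trace_id", trace_id, ctx)
    ctx = context_api.set_value("openai_agents.parent_id", parent_id, ctx)
    ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)
    return ctx
```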

@areibman
Contributor

@the-praxs The original ticket also mentions tool calls. Please either do that in this branch or make a new ticket :)

As for the review-- this is fairly complicated. @Dwij1704 has some more insights on the right method here.

@dot-agi
Member Author

dot-agi commented May 20, 2025

@the-praxs The original ticket also mentions tool calls. Please either do that in this branch or make a new ticket :)

As for the review-- this is fairly complicated. @Dwij1704 has some more insights on the right method here.

Will do it here

@areibman
Contributor

@the-praxs Another major issue here is that responses don't seem to be properly instrumented. Run any OpenAI Agents example and you'll see. Agents uses the Responses API, and for whatever reason, we don't catch prompts. Here's an example JSON we get in the AgentOps dashboard. Notice there's no prompt (user or system), only a completion.

```json
{
  "span_id": "99f46cd24e9df5ea",
  "parent_span_id": "cdcfbaa2b5798726",
  "span_name": "openai.responses.create",
  "span_kind": "Client",
  "span_type": "request",
  "service_name": "agentops",
  "start_time": "2025-05-11T04:26:18.965974",
  "end_time": "2025-05-11T04:26:38.825378",
  "duration": 19859404000,
  "status_code": "OK",
  "status_message": "",
  "attributes": {},
  "resource_attributes": {
    "ProjectId": "890e4ebb-88f1-46cd-844c-c5a995b3eab7",
    "agentops.project.id": "890e4ebb-88f1-46cd-844c-c5a995b3eab7",
    "cpu.count": "10",
    "cpu.percent": "22.3",
    "host.machine": "arm64",
    "host.name": "f3edd163dca2",
    "host.node": "macbookpro.lan",
    "host.os_release": "24.0.0",
    "host.processor": "arm",
    "host.system": "Darwin",
    "host.version": "Darwin Kernel Version 24.0.0: Tue Sep 24 23:39:07 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T6000",
    "imported_libraries": "[\"openai\",\"json\",\"agentops\",\"asyncio\"]",
    "memory.available": "4799184896",
    "memory.percent": "72.1",
    "memory.total": "17179869184",
    "memory.used": "6705627136",
    "os.type": "linux",
    "service.name": "agentops"
  },
  "span_attributes": {
    "gen_ai": {
      "completion": [
        {
          "0": {
            "content": "I’m locking Tauros into Choice Band Close Combat.\n\nWith 225 Atk and 1.5× Choice Band boost, Close Combat’s 120 BP STAB + 2× against Ogerpon’s Grass/Fire typing will reliably OHKO without ever missing—whereas Stone Edge, while stronger on paper, is 80 accuracy and a miss would let Ogerpon fire back first.",
            "id": "rs_682026ebd87481919b5840f896eb180709a776b3e68b5a1d",
            "type": "output_text"
          }
        },
        {
          "1": {
            "finish_reason": "completed",
            "id": "msg_682026fda6e88191be74ca810132bc7109a776b3e68b5a1d",
            "role": "assistant",
            "type": "message"
          }
        }
      ],
      "request": {
        "model": "o4-mini-2025-04-16",
        "temperature": "1",
        "top_p": "1"
      },
      "response": {
        "id": "resp_682026eb4d2c8191bf1f088e1e3dd9d609a776b3e68b5a1d",
        "model": "o4-mini-2025-04-16"
      },
      "usage": {
        "cache_read_input_tokens": "0",
        "completion_tokens": "2713",
        "prompt_tokens": "2705",
        "reasoning_tokens": "2624",
        "total_tokens": "5418"
      }
    },
    "instrumentation": {
      "name": "agentops",
      "version": "0.4.10"
    },
    "library": {
      "name": "openai",
      "version": "1.78.0"
    }
  },
  "event_timestamps": [],
  "event_names": [],
  "event_attributes": [],
  "link_trace_ids": [],
  "link_span_ids": [],
  "link_trace_states": [],
  "link_attributes": [],
  "metrics": {
    "total_tokens": 8042,
    "prompt_tokens": 2705,
    "completion_tokens": 2713,
    "cache_read_input_tokens": 0,
    "reasoning_tokens": 2624,
    "success_tokens": 8042,
    "fail_tokens": 0,
    "indeterminate_tokens": 0,
    "prompt_cost": "0.0029755",
    "completion_cost": "0.0119372",
    "total_cost": "0.0149127"
  }
}
```
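
As an illustration of what capturing the missing prompt could involve, here is a sketch that maps a Responses API request's instructions and input parameters onto gen_ai prompt attributes; the helper name and exact attribute keys are assumptions, not AgentOps' actual code.

```python
# Hypothetical sketch; responses.create accepts `instructions` (system-style
# text) and `input` (a string or a list of messages), neither of which shows
# up in the span above, so a wrapper could lift them into attributes like:
def extract_prompt_attributes(kwargs):
    attributes = {}
    instructions = kwargs.get("instructions")
    input_value = kwargs.get("input")
    if instructions:
        attributes["gen_ai.prompt.0.role"] = "system"
        attributes["gen_ai.prompt.0.content"] = instructions
    if isinstance(input_value, str):
        attributes["gen_ai.prompt.1.role"] = "user"
        attributes["gen_ai.prompt.1.content"] = input_value
    elif isinstance(input_value, list):
        for i, message in enumerate(input_value, start=1):
            attributes[f"gen_ai.prompt.{i}.role"] = message.get("role", "user")
            attributes[f"gen_ai.prompt.{i}.content"] = str(message.get("content", ""))
    return attributes
```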

@dot-agi
Member Author

dot-agi commented May 20, 2025

@the-praxs Another major issue here is that responses don't seem to be properly instrumented. Run any OpenAI Agents example and you'll see. Agents uses the Responses API, and for whatever reason, we don't catch prompts. Here's an example JSON we get in the AgentOps dashboard. Notice there's no prompt (user or system), only a completion.

I ran all the notebooks and checked the spans thoroughly; both prompts and completions were present in the LLM calls.

I cannot reproduce this issue, but let me see if there's something wonky.

@dot-agi
Member Author

dot-agi commented May 20, 2025

Cannot reproduce the issue you mentioned. Here are the trace IDs for each of the notebooks; they contain the data as intended:

  • Web search example: b3b938ad5b00922f67313155ce2ecd5c
  • Customer service: 810722bcf718821cd8f0043747906ce2
  • Agent workflow: 691151a51c4e9e78a6cd13d55f44cdb1

@dot-agi
Member Author

dot-agi commented May 21, 2025

Closing this since #987 makes a core change.

@dot-agi dot-agi closed this May 21, 2025
@dot-agi dot-agi deleted the fix/duplicate-agents-llm-calls branch May 21, 2025 17:41
Development

Successfully merging this pull request may close these issues:

[Bug]: OpenAI Agents SDK is not correctly instrumenting calls